Check with smaller windows

Overview: Script for water salinity data from water stations. Sources and information on the data are below. First I remove data points that failed the QC test. Then I use a rolling mean to remove data points more than +/- 2 standard deviations from the rolling mean (still figuring out what window of time makes the most sense). Finally I calculate an hourly median, since data were collected at different frequencies. This will set me up to do analysis between sites and with field data. Sites are graphed from closest to the delta to the mouth of the bay. If this file is too large to upload to GitHub I may split it into multiple markdowns, one per parameter.

Notes on data:

-Accessed and downloaded all data on September 30, 2020.
-Downloaded all data available from 1/1/2017-12/31/2019.
-China Camp: http://cdmo.baruch.sc.edu/aqs/zips.cfm. Data logged every 15 minutes. Requested citation format: NOAA National Estuarine Research Reserve System (NERRS). System-wide Monitoring Program. Data accessed from the NOAA NERRS Centralized Data Management Office website: http://cdmo.baruch.sc.edu/; accessed 30 September 2020.
-Richardson Bay: NERR data. Data logged every 15 minutes.
-EOS: https://oceanview.pfeg.noaa.gov/erddap/tabledap/rtcctdRTCysi.html. Data logged every 6 minutes. Time in UTC. Should I use this link instead? http://erddap.cencoos.org/erddap/tabledap/tiburon-water-tibc1.html. pH and salinity available. No air temperature.
-Fort Point: http://erddap.cencoos.org/erddap/tabledap/fort-point.html?time%2Csea_water_temperature%2Csea_water_temperature_qc_agg%2Cz&time%3E%3D2020-09-20T21%3A07%3A41Z&time%3C%3D2020-09-29T12%3A00%3A00Z. Time in UTC. Data logged every 6 minutes.

Next steps:
-fix the aesthetics of the graphs (font too small, change gradient, clean up keys, etc.)
-analysis with field data
-site characterization summary table

Set up

rm(list=ls())
library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes) 
library(cranlogs)
library(knitr)
library(openair)

Make a custom function to calculate the mean, standard deviation (sd), and a 95% confidence interval, where x is a numeric vector. Here “hi” and “lo” are defined as 2 standard deviations from the mean. This only creates a function, so nothing needs to change here and this step does not manipulate the data. You can change the equation to define hi and lo as 3 or however many sds away you would like. na.rm=TRUE removes any “NA” values. “ret” holds the new columns it will add when you run this on a df.

custom_stat_fun_2<-function(x, na.rm=TRUE){
  m<-mean(x, na.rm=na.rm)
  s<-sd(x, na.rm=na.rm)
  hi<-m+2*s  # upper cutoff: 2 sd above the mean
  lo<-m-2*s  # lower cutoff: 2 sd below the mean
  ret<-c(mean=m, stdev=s, hi.95=hi, lo.95=lo)
  return(ret)  # return the named vector explicitly
}
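As a quick sanity check (restating the definition so this chunk runs on its own), the function should return named values where hi.95 and lo.95 sit exactly 2 sd from the mean:

```r
custom_stat_fun_2 <- function(x, na.rm = TRUE){
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  c(mean = m, stdev = s, hi.95 = m + 2*s, lo.95 = m - 2*s)
}

# c(10, 12, 14): mean = 12, sd = 2, so hi.95 = 16 and lo.95 = 8
custom_stat_fun_2(c(10, 12, 14))
```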

Salinity

China Camp. Read in the data and subset it (not strictly needed, but it’s a large df). A later step needs a column named “date” with no time stamp.

cc<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/china_camp_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))

cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))

cc<-subset(cc, select=c(date, datetime, Sal, F_Sal))

cc%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: All data points",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).

Remove data that failed the QC test (an F_Sal flag of -3 indicates failure)

cc<-cc[!grepl("-3", cc$F_Sal),]

cc%>%
  ggplot(aes(x=date, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: Failed QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Still some points I’m not sure about, so I’m going to remove the data points flagged as “suspect” (flag 1) too.

cc<-cc[!grepl("1", cc$F_Sal),]

cc%>%
  ggplot(aes(x=date, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: Failed & suspect QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

I like this much more.
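One caveat with filtering QC flags by substring matching (as above): the pattern "1" also hits flags like "-1" or "11", and with grep plus negative indexing, a pattern that matches nothing drops every row. A quick illustration on a toy flag vector (hypothetical values, not the real F_Sal codes):

```r
flags <- c("0", "1", "-1", "-3", "11")

# Substring matching is greedy: this finds "1", "-1", and "11"
grep("1", flags)          # 2 3 5

# An anchored pattern matches only the exact flag "1",
# and grepl() is safe even when nothing matches
keep <- !grepl("^1$", flags)
flags[keep]               # "0" "-1" "-3" "11"
```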

Roll apply using the custom stat function. Needs a column named “date” with no time stamp in order to work. select = which parameter/column you want to apply the rolling average to; in this case we’re interested in salinity. FUN = the custom function created at the beginning. The width of the window depends on the frequency of data collection. Data were collected every 15 min, so width = 672 is 7 days, 1344 is 14 days, and 2880 is 30 days. 96 observations per day; 8 = 2 hours.
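The window arithmetic above can be computed rather than hard-coded, which makes it easy to try other windows later:

```r
# Observations per day for data logged every 15 minutes
obs_per_day <- 24 * 60 / 15     # 96

window_2hr <- 2 * 60 / 15       # 8
window_7d  <- obs_per_day * 7   # 672
window_14d <- obs_per_day * 14  # 1344
window_30d <- obs_per_day * 30  # 2880
```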

2 hour

cc<-cc%>%
  tq_mutate(
    select= Sal,
    mutate_fun= rollapply,
    width= 8,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter to remove values that are more than +/- 2 standard deviations from the rolling mean

cc<-filter(cc, Sal <hi.95 & Sal >lo.95)

cc%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: Failed QC points removed + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 7 rows containing missing values (geom_point).

Hourly median. For the timeAverage function to work, you need a date or POSIXct column named “date”, but it needs the timestamp, so we’ll add it back in. interval = frequency of data collection.

cc<-subset(cc, select=-c(date))
cc$date<-as.POSIXct(cc$datetime, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med<- timeAverage(cc,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc_sal<-cc.1hr.med %>% ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

cc_sal
## Warning: Removed 4563 rows containing missing values (geom_point).

EOS pier

Set up

eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))
eos$datetime<-as.POSIXct(eos$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos<-subset(eos, select=c(date, datetime, salinity))

eos%>%
  ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Plot with better view

eos%>%
  ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+ylim(0,35)+
  labs(title="Salinity data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2 rows containing missing values (geom_point).

Since this data set doesn’t have a QC test column, I’ll remove values that are far outside expected salinity values in Tiburon.

eos<-filter(eos, salinity >0 & salinity <40)

eos%>%
  ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from EOS research pier: Values outside of expected range removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Not sure about the pocket of lower salinity values at the end of 2018. Need to verify. A few other points stand out but they should be removed with the next steps.

Roll apply using the custom stat function. Data were collected every 6 minutes, so width = 1680 is 7 days, 3360 is 14 days, and 7200 is 30 days. 240 observations per day; 20 = 2 hours. Needs a column named “date” with no time stamp in order to work.

2hr

eos<-eos%>%
  tq_mutate(
    select= salinity,
    mutate_fun= rollapply,
    width= 20,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter and graph

eos<-filter(eos, salinity <hi.95 & salinity >lo.95)


ggplot(data = eos, mapping = aes(x = datetime, y = salinity, color=salinity)) + geom_point()+
  labs(title="Salinity data from EOS research pier: Extreme + +/- 2sd data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Looks better. Need to verify that low salinity pocket at the end of 2018.

Hourly median

eos<-subset(eos, select=-c(date))
names(eos)[names(eos) == "datetime"] <- "date"
eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos.1hr.med<- timeAverage(eos,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos_sal<-eos.1hr.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
eos_sal
## Warning: Removed 314 rows containing missing values (geom_point).

Richardson Bay

Set up

rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))

rb<-subset(rb, select=c(date, datetime, Sal, F_Sal))

rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove flagged data that did not pass QAQC (an F_Sal flag of -3 indicates failure)

rb<-rb[!grepl("-3", rb$F_Sal),]


rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Not sure about that low salinity pocket at the end of 2017, so I’m going to remove data points flagged as “suspect” and see if that helps.

rb<-rb[!grepl("1", rb$F_Sal),]

rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: Failed & suspect QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Looks better and got rid of most of the event that I wasn’t sure about. Still some points that I don’t like. Hopefully the next step helps.

Roll apply using the custom stat function. Data were collected every 15 min, so width = 672 is 7 days, 1344 is 14 days, and 2880 is 30 days. 96 observations per day; 8 = 2 hours.

rb<-rb%>%
  tq_mutate(
    select= Sal,
    mutate_fun= rollapply,
    width= 8,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

rb<-filter(rb, Sal <hi.95 & Sal >lo.95)

rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: Failed & suspect QC data + -/+2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 11 rows containing missing values (geom_point).

CHECK: Removed that stream of lower salinity data at the end of 2017/beginning of 2018. Need to verify with Matt Ferner. It’s removed with a rolling window larger than 2 hrs, but I want to make sure I’m not removing any real events/data.

Hourly median

rb<-subset(rb, select=-c(date))
rb$date<-as.POSIXct(rb$datetime, format = c("%m/%d/%y %H:%M"))
rb<-subset(rb, select=-c(datetime))

rb.1hr.med<- timeAverage(rb,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 11 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb_sal<-rb.1hr.med%>%
  ggplot(aes(x=date, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity data from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
rb_sal

Fort Point. Set up. For some reason I was getting an error later when I converted both the date and datetime columns to date columns, so I’m only converting one here, unlike the other sites, and I’ll convert the other one later. The method is still consistent with the other sites.

fp<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fort_point_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
 
fp$date<-as.Date(fp$date, format = c("%Y-%m-%d"))
names(fp)[names(fp) == "sea_water_practical_salinity"] <- "salinity"
names(fp)[names(fp) == "sea_water_practical_salinity_qc_agg"] <- "F_Sal"

fp<-subset(fp, select=c(datetime, date, salinity, F_Sal))

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: All Data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Remove flagged data; a flag of 4 indicates failure.

fp<-filter(fp, F_Sal!=4)

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Still a lot of points I don’t like (in earlier runs the 2 hour rolling window alone did not remove enough of these outliers). I’m going to also remove points that are flagged as “suspect” and see if it helps.

fp<-filter(fp, F_Sal!=3)

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: Failed & suspect QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

This already looks much better. It still has those points around 0, which just do not look right to me, so I’m going to pick them out for now, but I’d like a cleaner method if possible. These points weren’t removed with the 2 hr window, and I’d rather pull them out here so I don’t skew the next few steps.
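For a cleaner alternative to hand-picking the near-zero points, one option (a sketch on made-up numbers, not what’s used below) is a robust cutoff based on the median absolute deviation, which isn’t pulled around by the outliers themselves:

```r
# Hypothetical salinity vector with a few implausible near-zero readings
sal <- c(30.1, 29.8, 30.4, 0.2, 29.9, 30.2, 0.1, 30.0)

# Robust z-score: distance from the median in units of mad()
k <- 3.5
z <- abs(sal - median(sal)) / mad(sal)

sal[z <= k]   # keeps the ~30 values, drops the near-zero ones
```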

fp<-filter(fp, salinity >1)

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: Failed & suspect QC + <1 removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Much better. Still two points I don’t like, but they should be removed in the next step.

Roll apply using the custom stat function. Data were collected every 6 minutes, so width = 1680 is 7 days, 3360 is 14 days, and 7200 is 30 days. 240 observations per day; 20 = 2 hours (doesn’t remove enough).

This is where I was getting an error when I had both the date and datetime columns converted, so I just converted the date column, since that’s what’s used at this step, and I’ll convert the other one when I use it. It didn’t do this before, so I’m not sure what’s changed.

fp<-fp%>%
  tq_mutate(
    select= salinity,
    mutate_fun= rollapply,
    width= 20,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
fp<-filter(fp, salinity <hi.95 & salinity >lo.95)

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: Failed & suspect QC + +/-2sd + <1 removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Hourly median

fp<-subset(fp, select=-c(date))
fp$date<-as.POSIXct(fp$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))

fp.1hr.med<- timeAverage(fp,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 27 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp_sal<-ggplot(data = fp.1hr.med, mapping = aes(x = date, y = salinity, color=salinity)) + geom_point()+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity data from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
fp_sal
## Warning: Removed 3125 rows containing missing values (geom_point).

Salinity graphs together

all_sal<-ggarrange(cc_sal, eos_sal, rb_sal, fp_sal, nrow=4, ncol=1)
## Warning: Removed 4563 rows containing missing values (geom_point).
## Warning: Removed 314 rows containing missing values (geom_point).
## Warning: Removed 11 rows containing missing values (geom_point).
## Warning: Removed 3125 rows containing missing values (geom_point).
all_sal

Still need to verify a few events, but overall this is looking good.

Site characteristics

How many days had at least one hourly median reading of salinity below 10? Using hourly median data

China Camp. If salinity is less than 10 (lt10), mark it “low”. Make a date-only column, subset to only get rows with “low” salinity events, then remove duplicated days to count the number of days.

cc.1hr.med$lt10<-ifelse(cc.1hr.med$Sal <10, "low", "NA")

cc.1hr.med$date<-as.Date(cc.1hr.med$date)  # drop the time stamp
cc.low.event<-subset(cc.1hr.med, cc.1hr.med$lt10 == "low")

cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]

191
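The subset/duplicated steps above amount to counting the distinct dates with at least one low reading; the same count can be had in one line (a sketch on toy data, not a rerun of the real df):

```r
# Toy hourly data: day 1 has one low reading, day 2 has two, day 3 has none
df <- data.frame(
  date = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-03")),
  Sal  = c(8, 15, 9, 7, 22)
)

# Number of distinct days with at least one hourly value below 10
length(unique(df$date[df$Sal < 10]))   # 2
```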

Tiburon

names(eos.1hr.med)[names(eos.1hr.med) == "date"] <- "datetime"
eos.1hr.med$date<-as.Date(eos.1hr.med$datetime)  # drop the time stamp
eos.1hr.med$lt10<-ifelse(eos.1hr.med$salinity <10, "low", "NA")

eos.low.event<-subset(eos.1hr.med, eos.1hr.med$lt10 == "low")

eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]

180

Richardson Bay

names(rb.1hr.med)[names(rb.1hr.med) == "date"] <- "datetime"
rb.1hr.med$date<-as.Date(rb.1hr.med$datetime)  # drop the time stamp
rb.1hr.med$lt10<-ifelse(rb.1hr.med$Sal <10, "low", "NA")

rb.low.event<-subset(rb.1hr.med, rb.1hr.med$lt10 == "low")

rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]

19

Fort Point

names(fp.1hr.med)[names(fp.1hr.med) == "date"] <- "datetime"
fp.1hr.med$date<-as.Date(fp.1hr.med$datetime)  # drop the time stamp
fp.1hr.med$lt10<-ifelse(fp.1hr.med$salinity <10, "low", "NA")

fp.low.event<-subset(fp.1hr.med, fp.1hr.med$lt10 == "low")

fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]

27

How many days had at least one hourly median reading of salinity below 10: China Camp 191 EOS 180 Richardson Bay 19 Fort Point 27

How many days had at least one hourly median reading of salinity below 20? Using hourly median data

China Camp

cc.1hr.med$lt20<-ifelse(cc.1hr.med$Sal <20, "low", "NA")

cc.low.event<-subset(cc.1hr.med, cc.1hr.med$lt20 == "low")

cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]

493

Tiburon

eos.1hr.med$lt20<-ifelse(eos.1hr.med$salinity <20, "low", "NA")

eos.low.event<-subset(eos.1hr.med, eos.1hr.med$lt20 == "low")

eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]

400

Richardson Bay

rb.1hr.med$lt20<-ifelse(rb.1hr.med$Sal <20, "low", "NA")

rb.low.event<-subset(rb.1hr.med, rb.1hr.med$lt20 == "low")

rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]

154

Fort Point

fp.1hr.med$lt20<-ifelse(fp.1hr.med$salinity <20, "low", "NA")

fp.low.event<-subset(fp.1hr.med, fp.1hr.med$lt20 == "low")

fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]

173

How many days had at least one hourly median reading of salinity below 20: China Camp 493 EOS 400 Richardson Bay 154 Fort Point 173

How many days had a daily median salinity below 10? Using the data with the outliers removed (flagged and +/- 2sd), then calculating the daily median. Need a column named “date” with a timestamp for this to work. Using “day” as the measurement instead of 24 hours.

China Camp

cc.daily.med<- timeAverage(cc,
            avg.time="1 day",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.daily.med$lt10<-ifelse(cc.daily.med$Sal <10, "low", "NA")

cc.low.event<-subset(cc.daily.med, cc.daily.med$lt10 == "low")

cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]

153

EOS

eos.daily.med<- timeAverage(eos,
            avg.time="1 day",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.daily.med$lt10<-ifelse(eos.daily.med$salinity <10, "low", "NA")

eos.low.event<-subset(eos.daily.med, eos.daily.med$lt10 == "low")

eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]

69

Richardson Bay

rb.daily.med<- timeAverage(rb,
            avg.time="1 day",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 11 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb.daily.med$lt10<-ifelse(rb.daily.med$Sal <10, "low", "NA")

rb.low.event<-subset(rb.daily.med, rb.daily.med$lt10 == "low")

rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]

12

Fort Point

fp.daily.med<- timeAverage(fp,
            avg.time="1 day",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 27 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp.daily.med$lt10<-ifelse(fp.daily.med$salinity <10, "low", "NA")

fp.low.event<-subset(fp.daily.med, fp.daily.med$lt10 == "low")

fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]

9

How many days had a daily median salinity below 10: China Camp 153 EOS 69 Richardson Bay 12 Fort Point 9

How many days had a daily median salinity below 20? Using the data with the outliers removed (flagged and +/- 2sd), then calculating the daily median. Need a column named “date” with a timestamp for this to work.

China Camp

cc.daily.med$lt20<-ifelse(cc.daily.med$Sal <20, "low", "NA")

cc.low.event<-subset(cc.daily.med, cc.daily.med$lt20 == "low")

cc.low.event<-cc.low.event[!duplicated(cc.low.event$date),]

375

EOS

eos.daily.med$lt20<-ifelse(eos.daily.med$salinity <20, "low", "NA")

eos.low.event<-subset(eos.daily.med, eos.daily.med$lt20 == "low")

eos.low.event<-eos.low.event[!duplicated(eos.low.event$date),]

258

Richardson Bay

rb.daily.med$lt20<-ifelse(rb.daily.med$Sal <20, "low", "NA")

rb.low.event<-subset(rb.daily.med, rb.daily.med$lt20 == "low")

rb.low.event<-rb.low.event[!duplicated(rb.low.event$date),]

142

Fort Point

fp.daily.med$lt20<-ifelse(fp.daily.med$salinity <20, "low", "NA")

fp.low.event<-subset(fp.daily.med, fp.daily.med$lt20 == "low")

fp.low.event<-fp.low.event[!duplicated(fp.low.event$date),]

60

How many days had a daily median salinity below 20: China Camp 375 EOS 258 Richardson Bay 142 Fort Point 60

All together. I will need to format this better eventually, but this is just a data dive.

How many days had at least one hourly median reading of salinity below 10: China Camp 191 EOS 180 Richardson Bay 19 Fort Point 27

How many days had at least one hourly median reading of salinity below 20: China Camp 493 EOS 400 Richardson Bay 154 Fort Point 173

How many days had a daily median salinity below 10: China Camp 153 EOS 69 Richardson Bay 12 Fort Point 9

How many days had a daily median salinity below 20: China Camp 375 EOS 258 Richardson Bay 142 Fort Point 60

Interesting that you see the idea of Richardson Bay being a salinity refuge when looking at the hourly median data but not with the daily median data. Possibly the influence of the open ocean dominates the daily median at Fort Point. Certainly still something interesting happening at Richardson Bay.

Daily median. Graphing out of curiosity.

a<-cc.daily.med %>% ggplot(aes(x=date, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Daily median salinity from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

b<-eos.daily.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Daily median salinity from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

c<-rb.daily.med %>% ggplot(aes(x=date, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Daily median salinity from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

d<-fp.daily.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+ylim(0,35)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Daily median salinity from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

ggarrange(a,b,c,d, nrow=4, ncol=1)
## Warning: Removed 164 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 200 rows containing missing values (geom_point).
## Warning: Removed 80 rows containing missing values (geom_point).

Not sure this is informative but I was curious.

Now how can I see consecutive days? Like how many events <10 with at least 2 days in a row? https://predictivehacks.com/count-the-consecutive-events-in-r/
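Following the linked approach, rle() gives the lengths of consecutive runs, so counting events of at least 2 low days in a row could look like this (a sketch on a toy flag vector; assumes one row per day with no date gaps):

```r
# One logical per day: was the daily median below 10?
low <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

runs <- rle(low)
run_lengths <- runs$lengths[runs$values]  # lengths of the low runs: 2, 1, 3

# Events with at least 2 consecutive low days
sum(run_lengths >= 2)   # 2
```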